Body-worn first-person vision (FPV) cameras enable the extraction of a rich source of information on the environment from the subject's point of view. However, compared to other activity settings (e.g., kitchens and outdoor ambulatory scenarios), research on wearable-camera-based egocentric office activities has progressed slowly, mainly due to the lack of adequate datasets for training more sophisticated (e.g., deep learning) models for human activity recognition in office environments. This paper provides BON, a large and publicly available office activity dataset collected in different office settings across three geographical locations: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya), using a chest-mounted GoPro Hero camera. The BON dataset contains eighteen common office activities that can be categorised into person-to-person interactions (e.g., chatting with colleagues), person-to-object interactions (e.g., writing on a whiteboard), and proprioceptive activities (e.g., walking). Annotations are provided for 5-second video segments. In total, BON contains 25 subjects and 2639 segments. To facilitate further research in this sub-domain, we also provide results that can be used as baselines in future studies.
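As an illustration only, here is a minimal sketch of how segment-level annotations like BON's could be consumed, assuming a hypothetical CSV layout with one row per 5-second segment (the actual release format may differ):

```python
import csv
from pathlib import Path

# Three of the eighteen activities, mapped to the dataset's three
# super-categories; the label strings here are hypothetical placeholders.
SUPER_CLASSES = {
    "chatting": "person-to-person",
    "writing_whiteboard": "person-to-object",
    "walking": "proprioceptive",
}

def load_segments(annotation_csv):
    """Yield (video_path, start_seconds, activity, super_class) per segment."""
    with open(annotation_csv) as f:
        for row in csv.DictReader(f):
            activity = row["activity"]
            yield (Path(row["video"]), float(row["start"]), activity,
                   SUPER_CLASSES.get(activity, "unknown"))
```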
We present a decentralised framework for view-overlap recognition among freely moving cameras that operates without the need for a reference 3D map. Each camera independently extracts feature-point descriptors, aggregates them into a hierarchical structure, and shares them over time. View overlaps are recognised through view matching and geometric validation, which discards wrongly matched views. The proposed framework is generic and can be used with different descriptors. We conduct experiments on publicly available sequences as well as on new sequences recorded with hand-held cameras. We show that Oriented FAST and Rotated BRIEF (ORB) features with Bags of Binary Words within the proposed framework achieve higher precision and higher or similar accuracy compared to NetVLAD, RootSIFT, and SuperGlue.
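A minimal sketch of the pairwise view-matching and geometric-validation stage with OpenCV's ORB features; the decentralised aggregation into Bags of Binary Words is not shown, and file names and thresholds are hypothetical, not the authors' pipeline:

```python
import cv2
import numpy as np

img1 = cv2.imread("camera_a_frame.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("camera_b_frame.png", cv2.IMREAD_GRAYSCALE)

# Extract binary ORB descriptors in each view.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match with Hamming distance and Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Geometric validation: accept the candidate overlap only if enough
# matches are consistent with a single epipolar geometry (RANSAC).
overlap = False
if len(good) >= 8:
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    overlap = inliers is not None and int(inliers.sum()) >= 15  # assumed threshold
```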
The appearance of objects in underwater images is degraded by selective light attenuation, which reduces contrast and causes colour casts. This degradation depends on the water environment and increases with the distance of the object from the camera. Despite the growing number of works on underwater image enhancement and restoration, the lack of a commonly accepted evaluation measure is hindering progress, as it is difficult to compare methods. In this paper, we review commonly used colour-accuracy measures, such as the colour-reproduction error and CIEDE2000, as well as no-reference image-quality measures, such as UIQM, UCIQE, and CCF, which have not yet been systematically validated. We show that none of the no-reference quality measures satisfactorily assesses the quality of enhanced underwater images, and we discuss their main shortcomings. Images and results are available at https://puiqe.eecs.qmul.ac.uk.
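For reference, CIEDE2000 (one of the reviewed colour-accuracy measures) is available in scikit-image; a small sketch, assuming float RGB images in [0, 1] and a pixel-aligned reference such as a colour chart:

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def mean_ciede2000(enhanced_rgb, reference_rgb):
    """Average CIEDE2000 colour difference between an enhanced underwater
    image and a pixel-aligned reference, computed in CIELAB space."""
    return float(np.mean(deltaE_ciede2000(rgb2lab(enhanced_rgb),
                                          rgb2lab(reference_rgb))))
```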
Our voice encodes a uniquely identifiable pattern that can be used to infer private attributes, such as gender or identity, which an individual might wish not to reveal when using a speech-recognition service. To prevent attribute-inference attacks while supporting the speech-recognition task, we propose GenGAN, a generative adversarial network that synthesises voices concealing the gender or identity of the speaker. The proposed network includes a generator with a U-Net architecture that learns to fool a discriminator. We condition the generator only on gender information and use an adversarial loss that trades off signal distortion against privacy preservation. We show that GenGAN improves the privacy-utility trade-off compared with methods that treat gender information as a sensitive attribute to be protected.
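A hedged sketch of how a distortion/privacy trade-off of this kind could be expressed in PyTorch; the generator, the discriminator, the loss terms, and the weight `lam` are placeholder assumptions, not GenGAN's published objective:

```python
import torch.nn.functional as F

def privacy_utility_loss(generator, discriminator, spec, target_gender, lam=0.5):
    """spec: input spectrogram batch; target_gender: float labels in {0, 1}.
    An L1 term limits signal distortion (utility); an adversarial term pushes
    the discriminator's gender prediction towards the chosen target (privacy)."""
    fake = generator(spec, target_gender)
    distortion = F.l1_loss(fake, spec)
    privacy = F.binary_cross_entropy_with_logits(discriminator(fake),
                                                 target_gender)
    return distortion + lam * privacy
```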
Active position estimation (APE) is the task of localising one or more targets using one or more sensing platforms. APE is a key task in search-and-rescue missions, wildlife monitoring, source-term estimation, and collaborative mobile robotics. The success of APE depends on the level of cooperation among the sensing platforms, their number, their degrees of freedom, and the quality of the information gathered. APE control laws enable active sensing by satisfying either purely exploitative or purely explorative criteria. The former minimise the uncertainty of the position estimate, whereas the latter drive the platforms closer to completing their task. In this paper, we define the main elements of APE in order to systematically classify and critically discuss the state of the art in this domain. We also propose a reference framework as a formalism for classifying APE-related solutions. Overall, this survey explores the main challenges and envisages the main research directions in the field of autonomous perception systems for localisation tasks. It may also be beneficial for promoting the development of robust active-sensing methods for search-and-tracking applications.
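As a concrete instance of a purely exploitative criterion, a controller can greedily pick the action that minimises the trace of the predicted covariance of the position estimate (A-optimality); a minimal sketch, where `predict_covariance` stands in for a filter-dependent prediction step such as a hypothetical EKF update:

```python
import numpy as np

def greedy_exploitative_control(candidate_controls, predict_covariance):
    """Return the control whose predicted posterior covariance of the
    target-position estimate has the smallest trace."""
    traces = [np.trace(predict_covariance(u)) for u in candidate_controls]
    return candidate_controls[int(np.argmin(traces))]
```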
Acoustic and visual sensing can support the contactless estimation of the weight of a container and the amount of its content while the container is manipulated by a person. However, opaqueness and transparencies (of both the container and the content), as well as the variability of materials, shapes, and sizes, make this problem challenging. In this paper, we present an open framework to benchmark methods for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, well-defined tasks and performance measures, baseline and state-of-the-art methods, and an in-depth comparative analysis of these methods. Deep learning with neural networks using audio alone, or a combination of audio and visual data, is preferred for classifying the type and amount of the content, either independently or jointly. Regression and geometric approaches with visual data are preferred for determining the capacity of the container. Results show that methods using audio only as input modality achieve weighted average F1-scores of up to 81% and 97% for classifying the content type and level, respectively. Estimating the container capacity with vision-only approaches, and the filling mass with audio-visual multi-stage algorithms, reaches weighted average capacity and mass scores of up to 65%.
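The classification scores above are weighted-average F1-scores, i.e., per-class F1 weighted by class support; a minimal sketch with toy content-type labels:

```python
from sklearn.metrics import f1_score

# Toy labels standing in for content-type predictions on test containers.
y_true = ["pasta", "rice", "water", "water", "empty"]
y_pred = ["pasta", "water", "water", "water", "empty"]

print(f1_score(y_true, y_pred, average="weighted"))
```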
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons will use its copy of this signal as one of its many dendritic inputs, integrate them all, and fire an output if above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared between all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model into a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit into fully connected and convolutional layers and estimate their FLOPs and weights change. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements over standard ResNets of up to 1.73%. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
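One plausible reading of the proposed unit is y_j = b_j + sum_i ReLU(w_ij * x_i), i.e., each dendrite's weighted signal is filtered independently before the summation; sketched below as a custom Keras layer, as an illustration rather than the authors' released implementation:

```python
import tensorflow as tf

class DendriticDense(tf.keras.layers.Layer):
    """Dense layer where the ReLU is applied to each input-weight product
    (each 'dendrite') before the summation, not once after it."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        d_in = int(input_shape[-1])
        self.w = self.add_weight(shape=(d_in, self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, x):
        # (batch, d_in, 1) * (d_in, units) -> per-dendrite signals, filtered
        # independently, then linearly combined by summation.
        dendrites = tf.nn.relu(tf.expand_dims(x, -1) * self.w)
        return tf.reduce_sum(dendrites, axis=1) + self.b
```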
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
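A minimal sketch of the key modelling idea, treating the human's learning as a residual nonlinear dynamical system over internal-model parameters; all shapes, the network size, and the residual form are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LearningDynamics(nn.Module):
    """theta_{t+1} = theta_t + f(theta_t, o_t): a learned update rule for the
    human's internal-model parameters theta, given a new observation o_t."""

    def __init__(self, theta_dim, obs_dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(theta_dim + obs_dim, hidden),
                               nn.Tanh(),
                               nn.Linear(hidden, theta_dim))

    def forward(self, theta, obs):
        return theta + self.f(torch.cat([theta, obs], dim=-1))

# Fitting would roll this model over demonstrated trajectories and minimise
# the mismatch with internal-model parameters inferred from demonstrations.
```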
Explainability is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the topic, yet explainability still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the product of evidence stemming from the model and its input-output, and of the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's decision-making) and plausibility (i.e., how convincing the explanation looks to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and provides new insight into common explanation methods that we analyze as case studies.
Fruit is a key crop in worldwide agriculture, feeding millions of people. The standard supply chain of fruit products involves quality checks to guarantee freshness, taste, and, most of all, safety. An important factor that determines fruit quality is its stage of ripening. This is usually classified manually by experts in the field, which makes it a labor-intensive and error-prone process. Thus, there is a growing need for automation in the process of fruit ripeness classification. Many automatic methods have been proposed that employ a variety of feature descriptors for the food item to be graded. Machine learning and deep learning techniques dominate the top-performing methods. Furthermore, deep learning can operate on raw data and thus relieve the users from having to compute complex engineered features, which are often crop-specific. In this survey, we review the latest methods proposed in the literature to automate fruit ripeness classification, highlighting the most common feature descriptors they operate on.
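To illustrate the point that deep learning can operate on raw data without crop-specific engineered features, here is a generic transfer-learning baseline; the backbone, input size, and number of ripeness stages are placeholder choices not tied to any surveyed method:

```python
import tensorflow as tf

# Frozen ImageNet backbone as a generic feature extractor over raw images.
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g., 4 ripeness stages
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```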